Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(restore): adds --dc-mapping flag to restore command #4213

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

VAveryanov8
Copy link
Collaborator

@VAveryanov8 VAveryanov8 commented Jan 16, 2025

This adds support for --dc-mapping flag to restore command. It specifies mapping between DCs from the backup and DCs in the restored(target) cluster. Only 1 use case is supported: 1-1 dc mapping. This means that squeezing (restore dc1 and dc2 into dc3) or extending (restore dc1 into dc1 and dc2) DCs is not supported when --dc-mapping is provided.
So the syntax is:

source_dc1=target_dc1,source_dc2=target_dc2
Where
     equal(=) is used to separate source   dc name and target dc name
     comma(,)  is used to separate multiple mappings

If --dc-mapping is not provided, then current behavior should be preserved - each node with access to DC can download it data. Also it's allowed to provide only subset of DCs, ignoring source dc or target (or both).
Only works with tables restoration (--restore-tables=true).

Fixes: #3829


Please make sure that:

  • Code is split to commits that address a single change
  • Commit messages are informative
  • Commit titles have module prefix
  • Commit titles have issue nr. suffix

@VAveryanov8 VAveryanov8 marked this pull request as ready for review January 17, 2025 08:23
@Michal-Leszczynski
Copy link
Collaborator

Michal-Leszczynski commented Jan 17, 2025

In terms of syntax:

"dc1,!dc2=>dc2" - data from dc1 should be restored to dc2 DC. Ignoring dc2 from source cluster.

I find it confusing that it's possible to specify both restored and skipped DCs in the same mapping key.
It reads like "restore dc1 and not dc2 into dc2" instead of "don't restore dc2 and restore dc1 into dc2".
IMHO it should look like:

 "!dc2;dc1=>dc2"     - data from dc1 should be restored to dc2 DC. Ignoring dc2 from source cluster.

I would suggest to validate that positive and negative DC occurrences are not mixed in the mapping key and that negative DCs can't be mapped to anything, so that we avoid confusion and typing mistakes.

EDIT: The same goes here:

"dc1,dc2=>dc3,!dc4" - data from dc1 and dc2 DCs should be restored to dc3 DC. Ignoring dc4 DC from target cluster.

The negative DC is placed in the mapping value alongside positive DC?
Now I see that we can't simply allow to write !dc1 because it's ambiguous to whether it's a source or destination DC.

But I'm still in favor of not mixing positive and negative DCs here.
I could look like that:

> "=>!dc4;dc1,dc2=>dc3" - data from dc1 and dc2 DCs should be restored to dc3 DC. Ignoring dc4 DC from target cluster.

On the other hand, if we allow no DCs on either side of => then the ! is not needed anymore, but it's still nice for visibility.
What do you think about removing the ! in front of DC from the syntax and incorporating it into the =>? Something like:

"dc1!=>dc2,dc3;dc2=>dc1"- we want to skip dc1 in src, dc2 and dc3 in dst, and restore from dc2 in src into dc1 in dst

Alternative approach would be to add some special characters for distinguishing between src and dst DCs, but I guess that the relation to the => already takes care of that.

pkg/command/restore/res.yaml Show resolved Hide resolved
pkg/command/restore/dcmappings_test.go Outdated Show resolved Hide resolved
pkg/command/restore/dcmappings_test.go Outdated Show resolved Hide resolved
pkg/command/restore/cmd.go Outdated Show resolved Hide resolved
pkg/service/restore/batch.go Outdated Show resolved Hide resolved
pkg/service/restore/model.go Outdated Show resolved Hide resolved
pkg/service/restore/worker.go Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
@Michal-Leszczynski
Copy link
Collaborator

@VAveryanov8 I made a general review, but I didn't dive into other details, because it would be better to first discuss the current comments (other things might not matter after that).

@Michal-Leszczynski
Copy link
Collaborator

Also, we should deprecate the datacenter from the --location flag, as it was previously used to partially control the dc mapping. When dc mapping flag is present, it should be ignored. When it's not, it can still be respected, but in the implementation we would still just parse it into dc mapping on the SM side.

Copy link
Collaborator

@karol-kokoszka karol-kokoszka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VAveryanov8 Thanks for the extensive PR and for covering a lot of different scenarios, but we don't need such a complex logic.

It's enough to do the one to one mapping. Instead of allowing to merge multiple DCs in source or multiple DCs in target.

Let's discuss the goal of DC mapping on the call.

It's enough to have a possibility of:

  • restoring just a single DC (that's why I don't really see the reason for introducing another flag that skips validation)
  • explicitly defining that nodes from target DCX are expected to download data of source DCY and call to load & stream

return nil, errors.Wrap(err, "get status")
}

sourceMap, targetMap := target.DCMappings.calculateMappings()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: sourceMap => sourceDC2TargetDCMap , targetMap => targetDC2SourceDCMap


type dcMapping struct {
Source []string `json:"source"`
Target []string `json:"target"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DC mapping should be single string to single string.
DC1 (of the source cluster) => DC2 (of the destination cluster) ; DC2 (of the source) => DC1 (of the target)

And that's it, no need for the complex logic.

@karol-kokoszka
Copy link
Collaborator

@VAveryanov8 I see some recent pushes to this PR, is it ready for re-review ?

@VAveryanov8
Copy link
Collaborator Author

@VAveryanov8 I see some recent pushes to this PR, is it ready for re-review ?

Yes, it's ready for re-review

Copy link
Collaborator

@karol-kokoszka karol-kokoszka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !

@mikliapko There are SCT tests that were trying to restore multiDC cluster, but they were failing due to the problems with encryption at rest enabled on the cluster.
Two datacenters were using different encryption keys.

This PR is expected to fix this problem. AFAIR, you changed these tests to singleDC later. Is it possible to validate this PR against the SCT tests with the multiDC restore with encryption at rest enabled ?

@karol-kokoszka
Copy link
Collaborator

@VAveryanov8 Please rebase on master. You will have conflicts related to the new submodule, but they are suppose to be very easy to fix.

@mikliapko
Copy link

@mikliapko There are SCT tests that were trying to restore multiDC cluster, but they were failing due to the problems with encryption at rest enabled on the cluster. Two datacenters were using different encryption keys.

Yep, we can do it, just give me the SM build that can be used

@karol-kokoszka
Copy link
Collaborator

@mikliapko Here is the manager-build triggered on the current branch https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/manager-build/919/

You will have to edit the test to include --dc-mapping source_dc1=>target_dc1;source_dc2=>target_dc2 into the sctool restore call.

This adds support for --dc-mapping flag to restore command.
It specifies mapping between DCs from the backup and DCs in the restored(target) cluster.
All DCs from the source cluster should be explicitly mapped to all DCs in the target cluster. The only exception is when
source and target cluster has exact match: source dcs == target dcs.
Only works with tables restoration (--restore-tables=true).
Syntax:
    "source_dc1,source_dc2=>target_dc1,target_dc2"
Multiple mappings are separated by semicolons (;). Exclamation mark (!) before a DC indicates that it should be ignored during restore.
Examples:
    "dc1,dc2=>dc3"      - data from dc1 and dc2 DCs should be restored to dc3 DC.
    "dc1,dc2=>dc3,!dc4" - data from dc1 and dc2 DCs should be restored to dc3 DC. Ignoring dc4 DC from target cluster.
    "dc1,!dc2=>dc2"     - data from dc1 should be restored to dc2 DC. Ignoring dc2 from source cluster.

Fixes: #3829
This introduces use of dc mappings when restoring tables.
Now each dc is downloading only data from corresponding dc(s)
accordingly to user provided mapping.
Also some dcs can be explicitly ignored.

Fixes: #3829
This adds another cluster to docker setup, so we can have integration
tests for dc-mappings.

Fixes: #3829
This removes support for !dc syntax and introduces new
--skip-dc-mapping-validation flag which can be used when partial restore
is needed.
This removes third cluster, but adds another data center to the second
cluster.
This introduces LocationInfo which is a direct replacement of
locationHosts, but it contains more information about Location, like
what DC are actually stored in this Location, what hosts can access it
and the list of manifests from this location. Also LocationInfo is
created with the respect of dc mappings.
This simplifies `--dc-mapping` usage only for 1 use case: 1-1
dc mapping. This means that squeezing (restore dc1 and dc2 into dc3) or
extending (restore dc1 into dc1 and dc2) DCs is not supported when
`--dc-mapping` is provided.
So the syntax is also simplified:
```
source_dc1=>target_dc1;source_dc2=>target_dc2
Where
     => is used to separate source   dc name and target dc name
     ;  is used to separate multiple mappings
```
If `--dc-mapping` is not provided, then current behavior should be
preserved - each node with access to DC can download it data.
Also it's allowed to provide only subset of DCs, ignoring source dc or
target (or both).
This removed dc3 from second cluster as `--dc-mappings` was simplified
and there is no necessity in having cluster with another dc name.
As this pr does not introduce any breaking changes anymore, integration
tests can be simplified as well.
This makes locationInfo.AllHosts to return only uniq hosts. Also some
small fixes in tests and in error messages.
This fixes the behavior when dc-mapping is not set.
pkg/service/restore/model.go Outdated Show resolved Hide resolved
pkg/command/restore/dcmappings.go Outdated Show resolved Hide resolved
pkg/service/restore/batch_test.go Outdated Show resolved Hide resolved
testing/nodes_exec Outdated Show resolved Hide resolved
pkg/service/restore/restore_integration_test.go Outdated Show resolved Hide resolved
pkg/service/restore/worker.go Outdated Show resolved Hide resolved
pkg/service/restore/worker.go Outdated Show resolved Hide resolved
pkg/service/restore/worker.go Outdated Show resolved Hide resolved
pkg/service/restore/worker_test.go Outdated Show resolved Hide resolved
pkg/service/restore/worker_test.go Outdated Show resolved Hide resolved
@mikliapko
Copy link

@mikliapko Here is the manager-build triggered on the current branch https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/manager-build/919/

You will have to edit the test to include --dc-mapping source_dc1=>target_dc1;source_dc2=>target_dc2 into the sctool restore call.

Testing with the provided build.

To make it work, dc mapping should be taken in quotes (see examples below):

ubuntu@manager-regression-adapt-fo-monitor-node-55ef36da-1:~$ sctool restore -c test --restore-tables --location s3:manager-backup-tests-us-east-1 --snapshot-tag sm_20250210120404UTC --dc-mapping eu-west-2scylla_node_west=>eu-west-2scylla_node_west;us-eastscylla_node_east=>us-eastscylla_node_east
Error: invalid argument "eu-west-2scylla_node_west=" for "--dc-mapping" flag: invalid syntax, mapping should be in a format of sourceDcs=>targetDcs, but got: eu-west-2scylla_node_west=

-bash: us-eastscylla_node_east=: command not found
ubuntu@manager-regression-adapt-fo-monitor-node-55ef36da-1:~$ sctool restore -c test --restore-tables --location s3:manager-backup-tests-us-east-1 --snapshot-tag sm_20250210120404UTC --dc-mapping "eu-west-2scylla_node_west=>eu-west-2scylla_node_west;us-eastscylla_node_east=>us-eastscylla_node_east"
restore/e3cc3b69-0c48-4109-ad78-fbe6257f6eb9

@VAveryanov8 @karol-kokoszka Is it inevitable in this case?

@VAveryanov8
Copy link
Collaborator Author

VAveryanov8 commented Feb 10, 2025

To make it work, dc mapping should be taken in quotes (see examples below):

For this particular syntax yes, quotes are required.

But the good news is that I'm gonna get rid of it towards more simple syntax using just equal(=) instead of array(=>) and comma(,) instead of semi-column(;). Then I suppose it will be possible to use without quotes.

@mikliapko
Copy link

To make it work, dc mapping should be taken in quotes (see examples below):

For this particular syntax yes, quotes are required.

But the good news is that I'm gonna get rid of it towards more simple syntax using just equal(=) instead of array(=>) and comma(,) instead of semi-column(;). Then I suppose it will be possible to use without quotes.

Great, please, let me know when you will have the build with new implementation

mikliapko added a commit to mikliapko/scylla-cluster-tests that referenced this pull request Feb 10, 2025
To run a restore task for multiDC cluster with EaR enabled (otherwise
fails #1), the special flag has been introduced in Manager (#2) to map
backed up DC with DC under restore, for example,

sctool restore ... --dc-mapping dc1=dc1,dc2=dc2

The change introduces this new flag into `create_restore_task` method
and makes sure that if Scylla cluster has more than 1 datacenter - the
restore task will be triggered with this flag applied.

refs:
1. scylladb/scylla-manager#3871
2. scylladb/scylla-manager#4213
This simplifies syntax of dc-mapping by leveraging pflag.StringToString
functionality.
Also extends integration tests with validation that each node should
download only tables from corresponding DCs when dc-mapping is provided.
Copy link
Collaborator

@Michal-Leszczynski Michal-Leszczynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks nice!
Just some question about the comment.

pkg/service/restore/worker.go Outdated Show resolved Hide resolved
@karol-kokoszka
Copy link
Collaborator

@mikliapko Here is the latest build from this branch https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/manager-build/923/
It contains all recent changes.

mikliapko added a commit to mikliapko/scylla-cluster-tests that referenced this pull request Feb 12, 2025
To run a restore task for multiDC cluster with EaR enabled (otherwise
fails #1), the special flag has been introduced in Manager (#2) to map
backed up DC with DC under restore, for example,

sctool restore ... --dc-mapping dc1=dc1,dc2=dc2

The change introduces this new flag into `create_restore_task` method
and makes sure that if Scylla cluster has more than 1 datacenter - the
restore task will be triggered with this flag applied.

refs:
1. scylladb/scylla-manager#3871
2. scylladb/scylla-manager#4213
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Give possibility for restoring DC using mapping sourceDC -> destinationDC
4 participants